Using MineSet for Knowledge Discovery

Author

  • Barry G. Becker
Abstract

Applying a visualization to data without preprocessing often leads to incomprehensible results because the displays are cluttered and confusing. A solution is to use data mining to extract just the interesting results and then visualize them. There are two basic ways to carry out the mining: supervised and unsupervised. In the supervised case, the user has a particular goal in mind, while in the unsupervised case the user is performing a global search for interesting patterns.

MineSet provides two classifiers for goal-directed learning. A classifier is a model for predicting one attribute of a set of data given the values of several other attributes. Prior to inducing a classifier, a user must select one of the attributes as a label (the goal). The classifier is constructed from a training set in which the label values are known to be accurate. Once built, the model may be applied to new data where the label is unknown.

The Evidence Visualizer displays the classifier structure produced by running an evidence inducer on the server. Figure 4 shows the Evidence Visualizer being used to determine which data attributes correlate with earning more than $50,000 in salary. Each attribute is a row of pie charts. Each pie chart represents an interval or value of the attribute. The attributes are listed in order of usefulness for predicting the label (salary in this case). The height of each pie shows the number of records having a particular value. Interaction with the scene allows the user to ask questions like 'what is the probability that someone earns over $50,000 given that their education is High School graduate and their occupation is professional specialty?'.

The Tree Visualizer is used to display the decision tree structure produced by a decision tree inducer. The attribute shown at the root of the tree is the one determined to be the most important.
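One way a decision tree inducer might rank attributes for the root is by information gain. The sketch below is an illustration only: the abstract does not say which criterion MineSet uses, and the function names (`entropy`, `information_gain`, `best_root`) and the toy records are hypothetical, not taken from the paper's data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of label values."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, label):
    """Reduction in label entropy obtained by splitting rows on one attribute."""
    base = entropy([r[label] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_root(rows, attributes, label):
    """Pick the attribute with the highest information gain (assumed criterion)."""
    return max(attributes, key=lambda a: information_gain(rows, a, label))


# Hypothetical toy records, invented for illustration.
ROWS = [
    {"odor": "foul", "stalk_shape": "tapering", "edible": "no"},
    {"odor": "foul", "stalk_shape": "enlarging", "edible": "no"},
    {"odor": "almond", "stalk_shape": "tapering", "edible": "yes"},
    {"odor": "almond", "stalk_shape": "enlarging", "edible": "yes"},
    {"odor": "none", "stalk_shape": "tapering", "edible": "yes"},
    {"odor": "none", "stalk_shape": "enlarging", "edible": "no"},
]

if __name__ == "__main__":
    print(best_root(ROWS, ["odor", "stalk_shape"], "edible"))  # prints odor
```

In this toy set, odor separates the labels perfectly except when it is "none", so it scores a much higher gain than stalk shape and would be placed at the root.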
After the root split, branches are further split based on other attributes until further splits are deemed statistically insignificant. At this point, leaf nodes are close to containing only one label value. In Figure 5 one can see that odor is the most important attribute for determining whether or not a mushroom is poisonous. It completely determines the edibility, except when there is no odor, in which case stalk shape is used to discriminate further. A common commercial application for this is determining loan approvals based on …
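The kind of question the Evidence Visualizer is described as answering (the probability of a label value given several attribute values) can be sketched with a naive-Bayes-style calculation. This is an assumption made for illustration: the abstract does not specify how the evidence inducer combines attributes. The functions `train` and `posterior`, the attribute values, and the records are all hypothetical.

```python
from collections import Counter, defaultdict


def train(rows, label):
    """Count label values and, per label value, the values of every other attribute."""
    label_counts = Counter(r[label] for r in rows)
    value_counts = defaultdict(Counter)  # (attribute, label value) -> value counts
    for r in rows:
        for attr, value in r.items():
            if attr != label:
                value_counts[(attr, r[label])][value] += 1
    return label_counts, value_counts


def posterior(model, evidence):
    """P(label | evidence), assuming attributes are independent given the label."""
    label_counts, value_counts = model
    total = sum(label_counts.values())
    scores = {}
    for lab, n in label_counts.items():
        score = n / total  # prior probability of this label value
        for attr, value in evidence.items():
            score *= value_counts[(attr, lab)][value] / n  # unsmoothed likelihood
        scores[lab] = score
    norm = sum(scores.values())
    if norm == 0:
        raise ValueError("evidence never observed with any label value")
    return {lab: s / norm for lab, s in scores.items()}


# Hypothetical toy records, invented for illustration.
ROWS = [
    {"education": "HS-grad", "occupation": "Prof-specialty", "salary": ">50K"},
    {"education": "HS-grad", "occupation": "Sales", "salary": "<=50K"},
    {"education": "HS-grad", "occupation": "Sales", "salary": "<=50K"},
    {"education": "Bachelors", "occupation": "Prof-specialty", "salary": ">50K"},
    {"education": "Bachelors", "occupation": "Prof-specialty", "salary": ">50K"},
    {"education": "Bachelors", "occupation": "Sales", "salary": "<=50K"},
    {"education": "HS-grad", "occupation": "Prof-specialty", "salary": "<=50K"},
    {"education": "Bachelors", "occupation": "Sales", "salary": ">50K"},
]

if __name__ == "__main__":
    model = train(ROWS, "salary")
    query = {"education": "HS-grad", "occupation": "Prof-specialty"}
    print(posterior(model, query))
```

The sketch uses raw counts with no smoothing, so an attribute value never seen with a given label drives that label's score to zero; a production classifier would normally handle that case more gracefully.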

Similar resources

D2.2.5 MineSet™

D2.2.5.1 Abstract MineSet is a commercial data mining product from Silicon Graphics. It provides an interactive platform for data mining, integrating three powerful technologies: database and file access, analytical data mining engines, and data visualization. MineSet supports the knowledge discovery process from data access and preparation through iterative analysis and visualization to deploy...

D2.2.5 MineSet

D2.2.5.1 Abstract MineSet™ is a commercial data mining product from Silicon Graphics. It provides an interactive platform for data mining, integrating three powerful technologies: database and file access, analytical data mining engines, and data visualization. MineSet supports the knowledge discovery process from data access and preparation through iterative analysis and visualization to depl...

MineSet: An Integrated System for Data Mining

MineSet™, Silicon Graphics' interactive system for data mining, integrates three powerful technologies: database access, analytical data mining, and data visualization. It supports the knowledge discovery process from data access and preparation through iterative analysis and visualization to deployment. MineSet is based on a client-server architecture that scales to large databases. The dat...

MineSet(tm): A System for High-End Data Mining and Visualization

MineSet™ is a highly integrated suite of client-server tools for the high-end mining and visualization of very large enterprise databases. MineSet represents the confluence of several important software and hardware technologies: data mining algorithms, fast multiprocessing database servers, novel techniques for interactive 3-D data visualization, and powerful graphics workstations. MineSet pr...

Cluster Based Cross Layer Intelligent Service Discovery for Mobile Ad-Hoc Networks

The ability to discover services in a Mobile Ad hoc Network (MANET) is a major prerequisite. Cluster based cross layer intelligent service discovery for MANET (CBISD) combines a cluster-based architecture, caching of semantic details of services, and intelligent forwarding using network layer mechanisms. The cluster-based architecture using semantic knowledge provides scalability and accuracy. Also, the mini...

Designing an Ontology for Knowledge Discovery in Iran’s Vaccine

Ontology is a requirement engineering product and the key to knowledge discovery. It includes the terminology to describe a set of facts, assumptions, and relations with which the detailed meanings of vocabularies among communities can be determined. This is a qualitative content analysis research. This study has made use of ontology for the first time to discover the knowledge of vaccine in Ir...

Journal:
  • IEEE Computer Graphics and Applications

Volume 17

Publication date 1997